Developer(s) | Johannes Söding |
---|---|
Stable release | 1.5.1 / 23 October 2008 |
Written in | C++ |
Available in | English |
Type | Bioinformatics tool |
License | Creative Commons Attribution-NonCommercial-2.0 |
Website | ftp://toolkit.lmb.uni-muenchen.de/HHsearch/ |
HHsearch is a program for protein sequence searching that is free for non-commercial use.[1] HHpred is a free protein function and protein structure prediction server based on the HHsearch method.[2] HHpred/HHsearch are among the most popular methods for protein structure prediction and the detection of remotely related sequences, having been cited over 700 times.[3]
Sequence searches are frequently performed by biologists to infer the function of an unknown protein from its sequence. For this purpose, the protein's sequence is compared to the sequences of other proteins in public databases, and its function is deduced from those of the most similar sequences. Often, no sequences with annotated functions can be found in such a search. In this case, more sensitive methods are required to identify more remotely related proteins or protein families. From these relationships, hypotheses about the protein's functions, structure, and domain composition can be inferred. HHsearch takes a query protein sequence and searches databases for related proteins. The HHpred server and the HHsearch software package offer many popular, regularly updated databases, such as the Protein Data Bank, as well as the InterPro, Pfam, COG, and SCOP databases.
HHpred/HHsearch belongs to the class of profile-profile comparison tools, which includes the most sensitive sequence search methods to date.[1][4][5][6] These tools represent both the query sequence and the database sequences by sequence profiles, also called position-specific scoring matrices (PSSMs). Profiles are calculated from a multiple sequence alignment of related sequences, which are typically collected using the PSI-BLAST program from the National Center for Biotechnology Information (NCBI). A profile is a matrix containing, for each position in the query sequence, a similarity score for each of the 20 amino acids. These scores are calculated from the frequencies of the amino acids at the corresponding positions in the multiple sequence alignment. Because profiles contain much more information than a single sequence (e.g. the position-specific degree of conservation), profile-profile comparison methods are much more powerful than sequence-sequence comparison methods like BLAST or profile-sequence comparison methods like PSI-BLAST.[4]
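As an illustration of how a profile can be derived from a multiple sequence alignment, the following minimal sketch computes log-odds scores per column from amino acid frequencies. It is not the procedure used by PSI-BLAST or HHsearch: the function name `build_profile`, the uniform background distribution, and the simple additive pseudocount are assumptions made for this example, and real tools apply sequence weighting and more elaborate pseudocount schemes.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0, background=None):
    """Return one dict of log2-odds scores per alignment column.

    Assumptions for illustration only: a uniform background distribution
    (unless one is supplied), an additive pseudocount, and no sequence
    weighting. Gap characters are ignored.
    """
    if background is None:
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    profile = []
    for col in range(length):
        # Count residues in this column, skipping gaps and unknown symbols.
        counts = Counter(seq[col] for seq in alignment if seq[col] in background)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        scores = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            # Positive score: residue is more frequent than expected by chance.
            scores[aa] = math.log2(freq / background[aa])
        profile.append(scores)
    return profile


alignment = ["ACD", "ACE", "ACD", "GCD"]
profile = build_profile(alignment)
```

In this toy alignment the second column is fully conserved, so `profile[1]["C"]` is positive while the scores of residues that never occur in that column are negative. This position-specific conservation signal is exactly the information a single sequence cannot provide.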
HHpred/HHsearch represents query and database proteins by profile hidden Markov models (HMMs), an extension of sequence profiles that also records position-specific amino acid insertion and deletion frequencies. HHsearch searches a database of HMMs with a query HMM. Before starting the search through the actual database of HMMs, HHsearch/HHpred builds a multiple sequence alignment of related sequences using a context-specific version of PSI-BLAST, called CSI-BLAST. From this alignment, a profile HMM is calculated. The databases contain HMMs that are precalculated in the same fashion using PSI-BLAST. The output of HHpred and HHsearch is a ranked list of database matches (including E-values and probabilities for a true relationship) and the pairwise query-database sequence alignments. A search through the PDB database of proteins with solved 3D structure takes a few minutes. If a significant match with a protein of known structure (a "template") is found in the PDB database, HHpred allows the user to build a homology model using the MODELLER software, starting from the pairwise query-template alignment.
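When two profiles or profile HMMs are compared, each pair of aligned columns needs a similarity score. One score of this kind is the log-sum-of-odds, which asks how much more likely it is that the two columns emit the same amino acid than would be expected by chance. The sketch below illustrates the idea only. The function names, the uniform background, and the toy columns are assumptions for this example; it omits the insertion and deletion terms of a full HMM-HMM alignment and is not HHsearch's actual implementation.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed uniform background frequencies (real tools use empirical values).
BACKGROUND = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def column_score(q, p, background=BACKGROUND):
    """Log-sum-of-odds score for two columns of amino acid probabilities."""
    odds = sum(q[aa] * p[aa] / background[aa] for aa in background)
    return math.log2(odds)


def conserved_column(residue, major=0.81):
    """Hypothetical helper: a column dominated by one residue."""
    rest = (1.0 - major) / (len(AMINO_ACIDS) - 1)
    return {aa: (major if aa == residue else rest) for aa in AMINO_ACIDS}


leu = conserved_column("L")
lys = conserved_column("K")
uniform = dict(BACKGROUND)

same = column_score(leu, leu)           # both columns conserve leucine
different = column_score(leu, lys)      # columns conserve different residues
uninformative = column_score(leu, uniform)
```

Two columns that conserve the same residue score above zero, columns that conserve different residues score below zero, and a column identical to the background contributes a score of approximately zero, because it carries no position-specific information.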
Applications of HHpred/HHsearch include protein structure prediction, function prediction, domain prediction, domain boundary prediction, and evolutionary classification of proteins.
HHpred servers have been ranked among the best servers in three consecutive CASP blind protein structure prediction experiments. In CASP9, HHpredA, HHpredB, and HHpredC were ranked 1st, 2nd, and 3rd out of 81 participating automatic structure prediction servers in template-based modeling[7] and 6th, 7th, and 8th on all 147 targets, while being much faster than the best 20 servers.[8] In CASP8, HHpred was ranked 7th on all targets and 2nd on the subset of single-domain proteins, while still being more than 50 times faster than the top-ranked servers.[9]